54 research outputs found
Numerical studies of space filling designs: optimization of Latin Hypercube Samples and subprojection properties
Quantitative assessment of the uncertainties tainting the results of computer simulations is nowadays a major topic of interest in both industrial and scientific communities. One of the key issues in such studies is to get information about the output when the numerical simulations are expensive to run. This paper considers the problem of exploring the whole space of variations of the computer model input variables in the context of a high-dimensional exploration space. Various properties of space-filling designs are justified: interpoint distance, discrepancy, and minimum spanning tree criteria. A specific class of design, the optimized Latin Hypercube Sample, is considered. Several optimization algorithms from the literature are studied in terms of convergence speed, robustness to subprojection, and space-filling properties of the resulting design. Some recommendations for building such designs are given. Finally, another contribution of this paper is an in-depth analysis of the space-filling properties of the designs' 2D subprojections.
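As a rough illustration of the interpoint-distance criterion mentioned in this abstract, the sketch below draws random Latin Hypercube Samples and keeps the one with the largest minimum pairwise distance (a maximin criterion). The function names and the best-of-random-draws strategy are assumptions made for the example; they are not the optimization algorithms studied in the paper.

```python
import itertools
import math
import random


def latin_hypercube(n, d, rng):
    """Return n points in [0, 1)^d with exactly one point per stratum in each dimension."""
    columns = []
    for _ in range(d):
        perm = list(range(n))
        rng.shuffle(perm)
        # Point i falls in stratum perm[i] of this dimension, jittered inside the stratum.
        columns.append([(k + rng.random()) / n for k in perm])
    return [tuple(col[i] for col in columns) for i in range(n)]


def min_interpoint_distance(points):
    """Maximin criterion: the smallest Euclidean distance between any two points."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def best_of_random_lhs(n, d, trials, seed=0):
    """Crude optimization (an assumption of this sketch): keep the best of several random draws."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n, d, rng) for _ in range(trials))
    return max(candidates, key=min_interpoint_distance)


design = best_of_random_lhs(n=10, d=3, trials=50)
print(min_interpoint_distance(design))
```

Projecting the design onto any single axis still gives one point per stratum, which is the defining Latin Hypercube property; the maximin score only ranks designs within that class.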
Reconnaissance d'opérations d'algèbre linéaire dans un programme polyédrique
Writing code that uses an architecture to its full capability has become increasingly difficult in recent years. For some key operations, a dedicated accelerator or a finely tuned implementation exists and delivers the best performance. Thus, when compiling a program, it is attractive to identify these operations and issue calls to their high-performance implementations. In this dissertation, we focus on the problem of detecting these operations. We propose a framework that detects linear algebra subcomputations within a polyhedral program. The main idea is to partition the computation in order to isolate different subcomputations in a regular manner, then to consider each portion of the computation and try to recognize it as a combination of linear algebra operations. We partition the computation using a program transformation called monoparametric tiling. This transformation partitions the computation into blocks whose shape is a homothetic scaling of a fixed-size partitioning. We show that the tiled program remains polyhedral while allowing a limited amount of parametrization: a single size parameter. This improves on previous work on tiling, which forced a choice between these two properties. Then, in order to recognize computations, we introduce a template recognition algorithm built on a state-of-the-art program equivalence algorithm. We also propose several extensions to handle semantic properties commonly encountered in linear algebra. Finally, we combine these two contributions into a framework that detects linear algebra subcomputations. One component of this framework is a library of templates based on the BLAS specification. We demonstrate our framework on several applications.
Numerical computation of travelling breathers in Klein-Gordon chains
We numerically study the existence of travelling breathers in Klein-Gordon
chains, which consist of one-dimensional networks of nonlinear oscillators in
an anharmonic on-site potential, linearly coupled to their nearest neighbors.
Travelling breathers are spatially localized solutions having the property of
being exactly translated by p sites along the chain after a fixed propagation
time T (these solutions generalize the concept of solitary waves for which
p = 1). In the case of even on-site potentials, the existence of small
amplitude travelling breathers superposed on a small oscillatory tail has been
proved recently (G. James and Y. Sire, to appear in Comm. Math. Phys.,
2004), the tail being exponentially small with respect to the central
oscillation size. In this paper we compute these solutions numerically and
continue them into the large amplitude regime for different types of even
potentials. We find that Klein-Gordon chains can support highly localized
travelling breather solutions superposed on an oscillatory tail. We provide
examples where the tail can be made very small and is difficult to detect at
the scale of central oscillations. In addition we numerically observe the
existence of these solutions in the case of non-even potentials.
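To make the model in this abstract concrete, here is a minimal sketch of a Klein-Gordon chain integrated with velocity Verlet. It only simulates the lattice dynamics and does not compute travelling breathers. The on-site potential V(x) = x^2/2 + x^4/4, the coupling constant `gamma`, the periodic boundary conditions, and all function names are assumptions chosen for the example.

```python
def acceleration(x, gamma):
    """Force on each site: -V'(x_n) plus linear nearest-neighbour coupling (periodic chain)."""
    n = len(x)
    return [
        -(x[i] + x[i] ** 3) + gamma * (x[(i + 1) % n] - 2 * x[i] + x[i - 1])
        for i in range(n)
    ]


def energy(x, v, gamma):
    """Total energy: kinetic + on-site potential + coupling energy."""
    n = len(x)
    kinetic = sum(0.5 * vi * vi for vi in v)
    onsite = sum(0.5 * xi ** 2 + 0.25 * xi ** 4 for xi in x)
    coupling = sum(0.5 * gamma * (x[(i + 1) % n] - x[i]) ** 2 for i in range(n))
    return kinetic + onsite + coupling


def simulate(x, v, gamma, dt, steps):
    """Advance positions x and velocities v by `steps` velocity-Verlet steps."""
    x, v = list(x), list(v)
    a = acceleration(x, gamma)
    for _ in range(steps):
        v = [vi + 0.5 * dt * ai for vi, ai in zip(v, a)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        a = acceleration(x, gamma)
        v = [vi + 0.5 * dt * ai for vi, ai in zip(v, a)]
    return x, v


# Excite a single site and check that the energy is approximately conserved.
x0 = [0.0] * 16
x0[8] = 0.5
v0 = [0.0] * 16
e0 = energy(x0, v0, gamma=0.1)
x1, v1 = simulate(x0, v0, gamma=0.1, dt=0.01, steps=500)
print(e0, energy(x1, v1, gamma=0.1))
```

The chosen potential is even, matching the first case discussed in the abstract; a non-even potential would only change the on-site terms in `acceleration` and `energy`.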
Le tuilage mono-paramétrique est une transformation polyédrique
Tiling is a crucial program transformation with many benefits: it improves locality, exposes parallelism, allows for adjusting the ops-to-bytes balance of codes, and can be applied at multiple levels. Allowing tile sizes to be symbolic parameters at compile time has many benefits, including efficient autotuning and run-time adaptability to system variations. For polyhedral programs, parametric tiling in its full generality is known to be non-linear, breaking the mathematical closure properties of the polyhedral model. Most compilation tools therefore either avoid it by only performing fixed-size tiling, or apply it only in the final code generation step. Both strategies have limitations. We first introduce mono-parametric partitioning, a restricted parametric, tiling-like transformation which can be used to express a tiling. We show that, despite being parametric, it is a polyhedral transformation. To do so, we prove that applying mono-parametric partitioning (i) to a polyhedron yields a union of polyhedra, and (ii) to an affine function produces a piecewise-affine function. We then use these properties to show how to partition an entire polyhedral program, including one with reductions. Next, we generalize this transformation to arbitrary tile shapes that can tessellate the iteration space (e.g., hexagonal, trapezoidal, etc.). We show how mono-parametric tiling can be applied at multiple levels, and that it enables a wide range of polyhedral analyses and transformations to be applied.
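The property claimed for polyhedra can be checked by brute force on a small case. The sketch below is an illustration under stated assumptions: it tiles the triangle {0 <= j <= i < N} with square tiles of a single size parameter b, with N assumed to be a multiple of b, and enumerates points instead of manipulating polyhedra symbolically. All function names are hypothetical.

```python
def tile(i, b):
    """Split index i into (block index, local index) so that i == b * ib + il."""
    return divmod(i, b)


def nonempty_tiles(nb, b):
    """Tiles touched by the triangle {0 <= j <= i < nb * b} under b-by-b tiling."""
    n = nb * b
    tiles = set()
    for i in range(n):
        for j in range(i + 1):
            (ib, _), (jb, _) = tile(i, b), tile(j, b)
            tiles.add((ib, jb))
    return tiles


def expected_tiles(nb):
    """Block-level domain {0 <= jb <= ib < nb}: an affine description that never mentions b."""
    return {(ib, jb) for ib in range(nb) for jb in range(ib + 1)}


def local_points(ib, jb, b):
    """Points of tile (ib, jb) in local coordinates (il, jl)."""
    return {
        (il, jl)
        for il in range(b)
        for jl in range(b)
        if b * jb + jl <= b * ib + il
    }


for b in (1, 2, 3, 5):
    assert nonempty_tiles(4, b) == expected_tiles(4)

print(len(local_points(2, 1, 3)), len(local_points(2, 2, 3)))
```

In this example the tiled domain splits into two cases: full square tiles below the diagonal and triangular tiles on it. This is a small instance of a polyhedron becoming a union of polyhedra once it is partitioned.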
From micro-OPs to abstract resources: constructing a simpler CPU performance model through microbenchmarking
This paper describes PALMED, a tool that automatically builds a resource
mapping, a performance model for pipelined, super-scalar, out-of-order CPU
architectures. Resource mappings describe the execution of a program by
assigning instructions in the program to abstract resources. They can be used
to predict the throughput of basic blocks or as a machine model for the backend
of an optimizing compiler. PALMED does not require hardware performance
counters, and relies solely on runtime measurements to construct resource
mappings. This allows it to model not only execution port usage, but also other
limiting resources, such as the frontend or the reorder buffer. Also, thanks to
a dual representation of resource mappings, our algorithm for constructing
mappings scales to large instruction sets, like that of x86. We evaluate the
algorithmic contribution of the paper in two ways. First, we show that our
approach can reverse-engineer an accurate resource mapping from an
idealized performance model produced by an existing port mapping. Second, we
evaluate the pertinence of our dual representation, as opposed to the standard
port mapping, for throughput modeling by extracting a representative set of
basic blocks from the compiled binaries of the SPEC CPU 2017 benchmarks and
comparing the throughput predicted by existing machine models to that produced
by PALMED.
Bifurcations of discrete breathers in a diatomic Fermi-Pasta-Ulam chain
Discrete breathers are time-periodic, spatially localized solutions of the
equations of motion for a system of classical degrees of freedom interacting on
a lattice. Such solutions are investigated for a diatomic Fermi-Pasta-Ulam
chain, i.e., a chain of alternating heavy and light masses coupled by anharmonic
forces. For hard interaction potentials, discrete breathers in this model are
known to exist either as "optic breathers" with frequencies above the optic
band, or as "acoustic breathers" with frequencies in the gap between the
acoustic and the optic band. In this paper, bifurcations between different
types of discrete breathers are found numerically, with the mass ratio m and
the breather frequency omega as bifurcation parameters. We identify a period
tripling bifurcation around optic breathers, which leads to new breather
solutions with frequencies in the gap, and a second local bifurcation around
acoustic breathers. These results provide new breather solutions of the FPU
system which interpolate between the classical acoustic and optic modes. The
two bifurcation lines originate from a particular "corner" in parameter space
(omega,m). As parameters lie near this corner, we prove by means of a center
manifold reduction that small amplitude solutions can be described by a
four-dimensional reversible map. This allows us to derive formally a continuum
limit differential equation which characterizes at leading order the
numerically observed bifurcations. Comment: 30 pages, 10 figures
PALMED: Throughput Characterization for Superscalar Architectures
In a super-scalar architecture, the scheduler dynamically assigns micro-operations (µOPs) to execution ports. The port mapping of an architecture describes how an instruction decomposes into µOPs and lists, for each µOP, the set of ports it can be mapped to. It is used by compilers and performance debugging tools to characterize the throughput of a sequence of instructions repeatedly executed as the core component of a loop. This paper introduces a dual, equivalent representation: the resource mapping of an architecture is an abstract model where, to be executed, an instruction must use a set of abstract resources, themselves representing combinations of execution ports. For a given architecture, finding a port mapping is an important but difficult problem. Building a resource mapping is a more tractable problem and provides a simpler and equivalent model. This paper describes Palmed, a tool that automatically builds a resource mapping for pipelined, super-scalar, out-of-order CPU architectures. Palmed does not require hardware performance counters, and relies solely on runtime measurements. We evaluate the pertinence of our dual representation for throughput modeling by extracting a representative set of basic blocks from the compiled binaries of the SPEC CPU 2017 benchmarks. We compared the throughput predicted by existing machine models to that produced by Palmed, and found comparable accuracy to state-of-the-art tools, achieving a sub-10% mean square error rate on this workload on Intel's Skylake microarchitecture.
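One way to read the resource-mapping model described above is sketched below: each instruction charges a fixed load to every abstract resource it uses, and the steady-state cost of a basic block is set by the most heavily loaded resource. The instruction names, resource names, and usage values are invented for illustration and are not taken from Palmed or from any real CPU; the bottleneck formula is this sketch's interpretation of the abstract.

```python
from collections import defaultdict

# Hypothetical resource mapping: instruction -> {abstract resource: load in cycles}.
RESOURCE_MAPPING = {
    "ADD": {"r_alu": 0.25, "r_frontend": 0.25},
    "MUL": {"r_mul": 1.0, "r_alu": 0.25, "r_frontend": 0.25},
    "LOAD": {"r_load": 0.5, "r_frontend": 0.25},
}


def resource_loads(basic_block, mapping):
    """Sum, per abstract resource, the load contributed by every instruction in the block."""
    loads = defaultdict(float)
    for instruction in basic_block:
        for resource, usage in mapping[instruction].items():
            loads[resource] += usage
    return dict(loads)


def predicted_cycles(basic_block, mapping):
    """Predicted cycles per iteration: the load of the bottleneck resource."""
    loads = resource_loads(basic_block, mapping)
    return max(loads.values(), default=0.0)


block = ["ADD"] * 4 + ["MUL"] + ["LOAD"] * 2
loads = resource_loads(block, RESOURCE_MAPPING)
print(loads)
print(predicted_cycles(block, RESOURCE_MAPPING))
```

Because a non-port limit such as the frontend can be expressed as just another resource, the same maximum covers bottlenecks that a pure port mapping would not represent.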
Automatic Parallelization from Lustre Models in Avionics
This poster presents ongoing research on automatic generation and execution of embedded parallel C code. We target safety-critical avionics programs specified in the synchronous language Lustre. The work described is part of the ITEA 3 project ASSUME (September 2015 - August 2018). ASSUME focuses mainly on embedded software engineering for multi-/many-core platforms. Both synthesis (e.g., automatic code generation) and verification (e.g., static analysis) of programs are addressed in the project. ASSUME is driven by the use cases of its industrial partners. One of these use cases is the parallelization of an avionics application comprising about 5500 Lustre nodes. After an overview of the ASSUME project, both parallel code generation and execution on a many-core platform will be presented and demonstrated.